Intelligent Paper Notes

On the cross-lingual transferability of multilingual prototypical models across NLU tasks

Oralie Cattan, Christophe Servan, Sophie Rosset

Category: Natural Language Processing

2022-07-19

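Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven effective for limited domain and language applications when a sufficient number of training examples is available. In practice, these approaches suffer from the drawbacks of domain-driven design and of under-resourced languages. Domain and language models are supposed to grow and change as the problem space evolves. On the one hand, research on transfer learning has demonstrated the cross-lingual ability of multilingual Transformer-based models to learn semantically rich representations. On the other hand, beyond these approaches, meta-learning has enabled the development of task and language learning algorithms capable of far generalization. In this context, this article investigates cross-lingual transferability when few-shot learning with prototypical neural networks is used synergistically with multilingual Transformer-based models. Experiments on natural language understanding tasks from the MultiATIS++ corpus show that our approach substantially improves the observed transfer learning performance between low- and high-resource languages. More generally, our approach confirms that a meaningful latent space learned in a given language can be generalized to unseen and under-resourced languages using meta-learning.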

translated by Google Translate
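
The abstract pairs prototypical networks with multilingual Transformer encoders. As a minimal sketch of the prototypical classification step only (not the authors' implementation), the code below assumes each utterance has already been embedded into a vector by some multilingual encoder; here the embeddings are toy 2-dimensional vectors, and the intent labels `flight` and `airfare` are hypothetical. Each class prototype is the mean of its support embeddings, and a query is assigned to the class whose prototype is nearest in squared Euclidean distance, with a softmax over negative distances giving class probabilities.

```python
import math


def prototypes(support):
    """Average the support embeddings of each class into a single prototype vector."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Return the nearest-prototype label and a softmax over negative squared distances."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(logit - top) for label, logit in logits.items()}
    z = sum(exps.values())
    probs = {label: e / z for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Hypothetical 2-d "embeddings" for a two-intent, two-shot episode.
support = {
    "flight": [[1.0, 0.0], [0.8, 0.2]],
    "airfare": [[0.0, 1.0], [0.2, 0.9]],
}
label, probs = classify([0.9, 0.1], prototypes(support))
print(label, probs)
```

In a few-shot episode, `support` would hold the handful of labelled examples available for a low-resource language; the quality of the shared embedding space is what lets such a small support set work across languages.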

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

Oralie Cattan, Sahar Ghannay, Christophe Servan, Sophie Rosset

Category: Natural Language Processing | Artificial Intelligence

2022-07-19

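Over the last five years, the rise of self-attentional Transformer-based architectures has led to state-of-the-art performance on many natural language tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a strong need for benchmarking methodologies for under-resourced languages in data-scarce application conditions. Most pre-trained language models have been studied extensively on English, and only a few of them have been evaluated on French. In this paper, we propose a unified benchmark focused on evaluating model quality and ecological impact on two French spoken language understanding tasks. In particular, we benchmark 13 well-established Transformer-based models on the two available spoken language understanding tasks for French: MEDIA and ATIS-FR. Within this framework, we show that compact models can reach results comparable to those of larger ones while having a considerably lower ecological impact. However, this conclusion is nuanced and depends on the compression method considered.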


On the Usability of Transformers-based models for a French Question-Answering task

Oralie Cattan, Christophe Servan, Sophie Rosset

Category: Natural Language Processing | Artificial Intelligence

2022-07-19

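For many tasks, Transformer-based architectures have achieved state-of-the-art results, shifting practice from task-specific architectures to the fine-tuning of pre-trained language models. The ongoing trend is to train models with ever more data and parameters, which requires considerable resources. This has led to an intensive search for resource efficiency through algorithmic and hardware improvements that are evaluated only on English. It raises questions about the usability of these models when they are applied to small-scale learning problems for which only limited training data is available, especially for tasks in under-resourced languages. The lack of appropriately sized corpora hinders the application of data-driven and transfer-learning approaches. In this paper, we survey recent efforts dedicated to the usability of Transformer-based models and propose to evaluate these improvements on question-answering performance for French, a language with few resources for this task. We address the instability caused by data scarcity by investigating various training strategies: data augmentation, hyperparameter optimization, and cross-lingual transfer. We also introduce FrALBERT, a new compact model for French, which proves competitive in low-resource settings.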


A Machine Learning Case Study for AI-empowered echocardiography of Intensive Care Unit Patients in low- and middle-income countries

Xochicale Miguel, Thwaites Louise, Yacoub Sophie, Pisani Luigi, Tran Huy Nhat Phung, Kerdegari Hamideh, King Andrew, Gomez Alberto

Category: Machine Learning

2022-12-30

We present a machine learning (ML) case study that illustrates the challenges of clinical translation for a real-time AI-empowered echocardiography system using data from intensive care unit (ICU) patients in low- and middle-income countries (LMICs). The case study covers data preparation, curation and labelling of 2D ultrasound videos from 31 ICU patients in LMICs, as well as model selection, validation and deployment of three thinner neural networks that classify the apical four-chamber view (4CV). The results of the ML heuristics are promising for the implementation, validation and application of thinner networks to classify 4CV with limited datasets. We conclude by noting the need for (a) datasets with greater diversity of demographics and diseases, and (b) further investigation of thinner models that can run on low-cost hardware, so that they can be clinically translated in ICUs in LMICs. The code and other resources to reproduce this work are available at https://github.com/vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries.


A Fabric Soft Robotic Exoskeleton with Novel Elastic Band Integrated Actuators for Hand Rehabilitation

Cem Suulker, Sophie Skach, Kaspar Althoefer

Category: Robotics

2022-12-14

Common conditions such as stroke and spinal cord injury may cause loss of motor function in the hands. They can be treated with robot-assisted rehabilitation techniques, such as repeatedly opening and closing the hand with the help of a robot, more cheaply and in less time than with traditional methods. Hand exoskeletons have been developed to assist rehabilitation, but their bulky nature brings certain challenges. Because soft robots use elastomeric and fabric elements rather than heavy links, and are actuated pneumatically, hydraulically or by tendons rather than by traditional rotary or linear motors, soft hand exoskeletons are considered a better option for rehabilitation.


MIST: a Large-Scale Annotated Resource and Neural Models for Functions of Modal Verbs in English Scientific Text

Sophie Henning, Nicole Macher, Stefan Grünewald, Annemarie Friedrich

Category: Natural Language Processing | Artificial Intelligence

2022-12-14

Modal verbs (e.g., "can", "should", or "must") occur very frequently in scientific articles. Decoding their function is not straightforward: they are often used for hedging, but they may also denote abilities and restrictions. Understanding their meaning is important for various NLP tasks such as writing assistance or accurate information extraction from scientific text. To foster research on the usage of modals in this genre, we introduce the MIST (Modals In Scientific Text) dataset, which contains 3737 modal instances in five scientific domains annotated for their semantic, pragmatic, or rhetorical function. We systematically evaluate a set of competitive neural architectures on MIST. Transfer experiments reveal that leveraging non-scientific data is of limited benefit for modeling the distinctions in MIST. Our corpus analysis provides evidence that scientific communities differ in their usage of modal verbs, yet classifiers trained on scientific data generalize to some extent to unseen scientific domains.


An Algebraic Framework for Stock & Flow Diagrams and Dynamical Systems Using Category Theory

John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Eric Redekopp

Category: Natural Language Processing

2022-11-01

Stock and flow diagrams are already an important tool in epidemiology, but category theory lets us go further and treat these diagrams as mathematical entities in their own right. In this chapter we use communicable disease models created with our software, StockFlow.jl, to explain the benefits of the categorical approach. We first explain the category of stock-flow diagrams, and note the clear separation between the syntax of these diagrams and their semantics, demonstrating three examples of semantics already implemented in the software: ODEs, causal loop diagrams, and system structure diagrams. We then turn to two methods for building large stock-flow diagrams from smaller ones in a modular fashion: composition and stratification. Finally, we introduce the open-source ModelCollab software for diagram-based collaborative modeling. The graphical user interface of this web-based software lets modelers take advantage of the ideas discussed here without any knowledge of their categorical foundations.
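
To make the ODE semantics of a stock-flow diagram concrete, here is a minimal Python sketch. It is not StockFlow.jl's API (StockFlow.jl is a Julia package); the `Flow` class and the `vector_field` and `euler` functions are hypothetical. Each flow drains its source stock and fills its target stock at a rate that depends on the current state, so the derivative of each stock is its total inflow minus its total outflow. The example is a standard SIR communicable-disease model with illustrative parameters, integrated with forward Euler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


@dataclass
class Flow:
    name: str
    source: Optional[str]  # stock drained by the flow (None = outside the model)
    target: Optional[str]  # stock filled by the flow (None = outside the model)
    rate: Callable[[State], float]


def vector_field(stocks: List[str], flows: List[Flow]) -> Callable[[State], State]:
    """ODE semantics of the diagram: d(stock)/dt = total inflow - total outflow."""
    def f(state: State) -> State:
        d = {s: 0.0 for s in stocks}
        for fl in flows:
            r = fl.rate(state)
            if fl.source is not None:
                d[fl.source] -= r
            if fl.target is not None:
                d[fl.target] += r
        return d
    return f


def euler(f: Callable[[State], State], state: State, dt: float, steps: int) -> State:
    """Integrate the ODE with fixed-step forward Euler."""
    for _ in range(steps):
        d = f(state)
        state = {s: v + dt * d[s] for s, v in state.items()}
    return state


beta, gamma = 0.3, 0.1  # illustrative infection and recovery rates
stocks = ["S", "I", "R"]
flows = [
    Flow("infection", "S", "I", lambda x: beta * x["S"] * x["I"] / (x["S"] + x["I"] + x["R"])),
    Flow("recovery", "I", "R", lambda x: gamma * x["I"]),
]
final = euler(vector_field(stocks, flows), {"S": 990.0, "I": 10.0, "R": 0.0}, dt=0.1, steps=1000)
print(final)
```

Keeping the diagram (the `stocks` and `flows` lists) separate from the function that interprets it loosely mirrors the syntax/semantics split described above: a different interpretation could in principle be read off the same lists without changing them.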


What Are You Anxious About? Examining Subjects of Anxiety during the COVID-19 Pandemic

Lucia L. Chen, Steven R. Wilson, Sophie Lohmann, Daniela V. Negraia

Category: Natural Language Processing

2022-09-27

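COVID-19 has had disproportionate mental health consequences for the public at different phases of the pandemic. We use a computational approach to capture the specific aspects that trigger an online community's anxiety about the pandemic and to investigate how these aspects change over time. First, we used thematic analysis to identify nine subjects of anxiety (SOAs) in a sample of Reddit posts (n = 86) from r/COVID19_support. Then, we quantified Reddit users' anxiety by training algorithms on a manually annotated sample (n = 793) to automatically label the SOAs in a larger chronological sample (n = 6,535). The nine SOAs align with items in various recently developed pandemic anxiety measurement scales. We observed that Reddit users' concerns about health risks remained high in the first eight months of the pandemic. These concerns diminished dramatically even though a surge of cases occurred later. In general, users' language disclosing the SOAs became less intense as the pandemic progressed. However, worries about mental health and the future increased steadily throughout the period covered in this study. People also tended to use more intense language to describe mental health concerns than health risks or death concerns. Our results suggest that although COVID-19 gradually weakened as a health threat thanks to appropriate countermeasures, the mental health condition of this online group did not necessarily improve. Our system lays the groundwork for population health and epidemiology scholars to examine, in a timely fashion, the aspects that provoke pandemic anxiety.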


Evaluation of Medical Image Segmentation Models for Uncertain, Small or Empty Reference Annotations

Sophie Ostmeier, Brian Axelrod, Jeroen Bertels, Fabian Isensee, Maarten G. Lansberg, Soren Christensen, Gregory W. Albers, Li-Jia Li, Jeremy J. Heit

Category: Computer Vision | Machine Learning

2022-09-26

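Performance metrics for medical image segmentation models are used to measure the agreement between reference annotations and predictions. A common set of metrics is used when developing such models so that results are more comparable. However, there is a mismatch between the distributions in public data sets and the cases encountered in clinical practice. Many common metrics fail to measure the impact of this mismatch, especially for clinical data sets that contain uncertain, small, or empty reference annotations. As a result, such metrics may not validate whether a model achieves clinically meaningful agreement. Dimensions for evaluating clinical value include independence from the volume of the reference annotation, consideration of the uncertainty of reference annotations, reward for volumetric and/or location agreement, and reward for correctly classifying empty reference annotations. Unlike common public data sets, our in-house data set is more representative: it contains uncertain, small, or empty reference annotations. We examine publicly available metrics on the predictions of a deep learning framework to identify in which settings common metrics provide meaningful results. We compare against a public benchmark data set without uncertain, small, or empty reference annotations. The code will be released.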
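
Two of the dimensions listed above, dependence on annotation volume and handling of empty references, are easy to see in the Dice coefficient, one widely used overlap metric. This is a minimal sketch, not the metric suite studied in the paper: masks are flat 0/1 lists, and returning 1.0 when both the prediction and the reference are empty is an assumed convention (Dice is 0/0 in that case). The function name `dice` and the `empty_score` parameter are hypothetical.

```python
def dice(pred, ref, empty_score=1.0):
    """Dice coefficient 2|P & R| / (|P| + |R|) for binary masks given as flat 0/1 lists.

    If both masks are empty the ratio is 0/0; we return `empty_score`
    (an assumed convention) so a correct "no lesion" prediction is rewarded.
    """
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if total == 0:
        return empty_score
    return 2.0 * inter / total


# The same one-voxel shift applied to a 2-voxel lesion and to a 20-voxel lesion.
small = dice([0, 0, 1, 1, 0], [0, 1, 1, 0, 0])
large = dice([0] + [1] * 20 + [0] * 4, [1] * 20 + [0] * 5)
empty_ok = dice([0] * 5, [0] * 5)       # correctly predicts "no lesion"
false_pos = dice([1, 0, 0], [0, 0, 0])  # any prediction on an empty reference scores 0
print(small, large, empty_ok, false_pos)
```

The identical one-voxel error halves the score of the small lesion (0.5) but costs the large lesion only 0.05 (0.95), which is the kind of volume dependence the abstract describes.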


Random graph matching at Otter's threshold via counting chandeliers

Cheng Mao, Yihong Wu, Jiaming Xu, Sophie H. Yu

Category: Machine Learning (Statistics)

2022-09-25

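We propose an efficient graph matching algorithm based on similarity scores constructed by counting a certain family of weighted trees rooted at each vertex. For two Erdős–Rényi graphs $\mathcal{G}(n,q)$ whose edges are correlated through a latent vertex correspondence, we show that the algorithm correctly matches all but a vanishing fraction of the vertices with high probability, provided that $nq \to \infty$ and the edge correlation coefficient $\rho$ satisfies $\rho^2 > \alpha \approx 0.338$, where $\alpha$ is Otter's tree-counting constant. Moreover, this almost-exact matching can be made exact under an extra condition that is information-theoretically necessary. This is the first polynomial-time graph matching algorithm that succeeds at an explicit constant correlation and applies to both sparse and dense graphs. In comparison, previous methods either require $\rho = 1 - o(1)$ or are restricted to sparse graphs. The crux of the algorithm is a carefully curated family of rooted trees, called chandeliers, which allows the graph correlation to be extracted effectively from counts of the same tree while suppressing the undesirable correlation between counts of different trees.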
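
The threshold $\rho^2 > \alpha$ involves Otter's tree-counting constant. The sketch below does not implement the chandelier-counting algorithm; it only illustrates where $\alpha \approx 0.338$ comes from. By Otter's classical asymptotics, the number $a_n$ of unlabeled rooted trees on $n$ vertices grows like $C\,\alpha^{-n} n^{-3/2}$, so $\alpha$ can be estimated from consecutive counts computed with the standard divisor-sum recurrence (OEIS A000081). The function names are illustrative.

```python
import math


def rooted_tree_counts(n_max):
    """Return a list a with a[n] = number of unlabeled rooted trees on n vertices, 1 <= n <= n_max."""
    a = [0] * (n_max + 1)
    b = [0] * (n_max + 1)  # b[k] = sum over divisors d of k of d * a[d]
    a[1] = 1
    for n in range(1, n_max):
        b[n] = sum(d * a[d] for d in range(1, n + 1) if n % d == 0)
        # Classical recurrence: n * a[n+1] = sum_{k=1..n} b[k] * a[n-k+1]; the division is exact.
        a[n + 1] = sum(b[k] * a[n - k + 1] for k in range(1, n + 1)) // n
    return a


def otter_alpha_estimate(n):
    """Estimate alpha from the asymptotics a[n] ~ C * alpha**(-n) * n**(-3/2)."""
    a = rooted_tree_counts(n + 1)
    # a[n] / a[n+1] tends to alpha * ((n+1)/n)**1.5, so undo that polynomial factor.
    return (a[n] / a[n + 1]) * (n / (n + 1)) ** 1.5


alpha = otter_alpha_estimate(400)
rho_min = math.sqrt(alpha)  # rho**2 > alpha is equivalent to |rho| > sqrt(alpha)
print(f"alpha ~ {alpha:.4f}, |rho| must exceed ~ {rho_min:.4f}")
```

Exact Python integers keep the tree counts precise even though they grow exponentially; only the final ratio is converted to a float.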
